Project DAND - Red Wine by Hoi Rim Chung

Univariate Plots Section

This exploratory data analysis looks at the quality of red wines and what influences it. I am keen to find out which chemical properties have the greatest effect on the final quality rating, which is scored from 0 to 10. This should eventually show which factors to look out for when choosing a quality red wine. The data was obtained through the Udacity Data Analyst Nanodegree website; it is also available from various internet sources, including Kaggle.

Number of observations (1599) and number of variables (13)

## [1] 1599   13

Column names: quality plus measurements of different chemical properties

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"

Summary statistics (min, max, quartiles, median and mean) for each variable

##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000
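The three outputs above (dimensions, column names and summary) can be reproduced along the following lines. This is only a sketch: the report does not show the file name or the loading code, so the data frame name `redwine` and the inline CSV text are assumptions, and the three rows are the first values printed by `str()` further down (where density is shown rounded).

```r
# Toy stand-in for the CSV file: first three rows as printed by str() in this report.
# Density values are rounded in that printout, so they are approximate here.
csv_text <- "X,fixed.acidity,volatile.acidity,citric.acid,residual.sugar,chlorides,free.sulfur.dioxide,total.sulfur.dioxide,density,pH,sulphates,alcohol,quality
1,7.4,0.7,0,1.9,0.076,11,34,0.998,3.51,0.56,9.4,5
2,7.8,0.88,0,2.6,0.098,25,67,0.997,3.2,0.68,9.8,5
3,7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8,5"

# With the real data this would be read.csv() pointed at the downloaded file
redwine <- read.csv(text = csv_text)

dim(redwine)      # number of rows and columns
names(redwine)    # column names
summary(redwine)  # min, quartiles, median, mean and max per column
```

On the full dataset `dim()` reports 1599 rows and 13 columns, as shown above; on this toy input it reports 3 rows.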

Next, the format of each variable is examined to decide what is needed, what is not, and which variables should change type. For example, the column ‘X’ will be removed, since it is merely an index and does not add any value. The ‘quality’ rating, which is stored as an integer, will be used to create a new column called ‘quality_rating’: ratings of 4 or below are Bad, 5 to 6 are Average and 7 or above are Good. (Had the dataset contained ratings of 0 to 2 or 9 to 10, separate Worst and Excellent groups could have been added.)

## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
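A minimal sketch of the two clean-up steps described above: dropping the index column and deriving ‘quality_rating’. The toy data frame uses the first ten alcohol and quality values printed by `str()`; the name `redwine` and the use of `cut()` are assumptions, since the report does not show how the column was built.

```r
# Toy data: first ten values of alcohol and quality from the str() output above
redwine <- data.frame(
  X       = 1:10,
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Drop the index column, which carries no information
redwine$X <- NULL

# Bin the integer rating: up to 4 = Bad, 5 to 6 = Average, 7 and above = Good.
# cut() uses right-closed intervals by default: (-Inf, 4], (4, 6], (6, Inf)
redwine$quality_rating <- cut(
  redwine$quality,
  breaks = c(-Inf, 4, 6, Inf),
  labels = c("Bad", "Average", "Good")
)

table(redwine$quality_rating)
```

Using `cut()` gives an ordered set of factor levels (Bad, Average, Good), which keeps tables and plots in a sensible order.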

Distribution of red wine ratings: the number of wines at each rating

## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18

Below is a bar graph of the ratings grouped as Bad, Average and Good, followed by the count in each group. The majority of wines have an Average rating. This imbalance may limit how meaningful the exploratory analysis can be for wines rated Bad or Good.

## 
##     Bad Average    Good 
##      63    1319     217
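To make the imbalance concrete, the group counts above can be turned into percentage shares. This is a small sketch using only the three counts printed in the table; the variable names are illustrative.

```r
# Counts per rating group, copied from the table above
group_counts <- c(Bad = 63, Average = 1319, Good = 217)

# Share of each group as a percentage of all 1599 wines
group_share <- round(100 * prop.table(group_counts), 2)
group_share
```

The Average group accounts for roughly 82% of the wines, which is why conclusions about Bad and Good wines rest on far fewer observations.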

Fixed acidity is one of the key chemical properties thought to influence the quality of red wine. Its distribution peaks at 7.2 (the most frequent value) and is skewed to the right.

Some outliers are removed from this plot and from the plots that follow.

## 
##  4.6  4.7  4.9    5  5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9    6  6.1 
##    1    1    1    6    4    6    4    5    1   14    2    4    9   13   16 
##  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9    7  7.1  7.2  7.3  7.4  7.5  7.6 
##   20   14   25   17   37   28   46   38   50   57   67   44   44   52   46 
##  7.7  7.8  7.9    8  8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9    9  9.1 
##   49   53   42   42   26   45   40   26   19   27   24   34   33   26   29 
##  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9   10 10.1 10.2 10.3 10.4 10.5 10.6 
##   16   22   17   14   17    9   15   26   23   10   19   11   21   12   14 
## 10.7 10.8 10.9   11 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 11.9   12 12.1 
##   10   10    8    3    9    5    7    5   13   12    3    3   12    7    1 
## 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9   13 13.2 13.3 13.4 13.5 13.7 13.8 
##    4    5    4    7    4    4    5    2    3    3    3    1    1    2    1 
##   14 14.3   15 15.5 15.6 15.9 
##    1    1    2    2    2    1

Volatile acidity is another important acidity measure for red wine. Its distribution is also skewed to the right but is reasonably bell-shaped.

Citric acid is the third acidity measure. Its distribution has three peaks: at 0, 0.24 and 0.49.

## 
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
##  132   33   50   30   29   20   24   22   33   30   35   15   27   18   21 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##   19    9   16   22   21   25   33   27   25   51   27   38   20   19   21 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##   30   30   32   25   24   13   20   19   14   28   29   16   29   15   23 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##   22   19   18   23   68   20   13   17   14   13   12    8    9    9    8 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
##    9    2    1   10    9    7   14    2   11    4    2    1    1    3    4 
## 0.75 0.76 0.78 0.79    1 
##    1    3    1    1    1

Residual sugar has a right-skewed distribution with a very long tail. The most frequent value is 2.0.

## 
##  0.9  1.2  1.3  1.4  1.5  1.6 1.65  1.7 1.75  1.8  1.9    2 2.05  2.1 2.15 
##    2    8    5   35   30   58    2   76    2  129  117  156    2  128    2 
##  2.2 2.25  2.3 2.35  2.4  2.5 2.55  2.6 2.65  2.7  2.8 2.85  2.9 2.95    3 
##  131    1  109    1   86   84    1   79    1   39   49    1   24    1   25 
##  3.1  3.2  3.3  3.4 3.45  3.5  3.6 3.65  3.7 3.75  3.8  3.9    4  4.1  4.2 
##    7   15   11   15    1    2    8    1    4    1    8    6   11    6    5 
## 4.25  4.3  4.4  4.5  4.6 4.65  4.7  4.8    5  5.1 5.15  5.2  5.4  5.5  5.6 
##    1    8    4    4    6    2    1    3    1    5    1    3    1    8    6 
##  5.7  5.8  5.9    6  6.1  6.2  6.3  6.4 6.55  6.6  6.7    7  7.2  7.3  7.5 
##    1    4    3    4    4    3    2    3    2    2    2    1    1    1    1 
##  7.8  7.9  8.1  8.3  8.6  8.8  8.9    9 10.7   11 12.9 13.4 13.8 13.9 15.4 
##    2    3    2    3    1    2    1    1    1    2    1    1    2    1    2 
## 15.5 
##    1

Chlorides is also skewed to the right, with a long right tail. The most frequent value is 0.08.

## 
## 0.012 0.034 0.038 0.039 0.041 0.042 0.043 0.044 0.045 0.046 0.047 0.048 
##     2     1     2     4     4     3     1     5     4     4     4     8 
## 0.049  0.05 0.051 0.052 0.053 0.054 0.055 0.056 0.057 0.058 0.059  0.06 
##     8    12     1    10     5    13     8     9    10    14    17    16 
## 0.061 0.062 0.063 0.064 0.065 0.066 0.067 0.068 0.069  0.07 0.071 0.072 
##    11    24    22    20    23    32    27    30    21    35    47    24 
## 0.073 0.074 0.075 0.076 0.077 0.078 0.079  0.08 0.081 0.082 0.083 0.084 
##    35    55    45    51    47    51    43    66    40    46    35    49 
## 0.085 0.086 0.087 0.088 0.089  0.09 0.091 0.092 0.093 0.094 0.095 0.096 
##    25    31    25    32    25    21    19    22    21    19    23    18 
## 0.097 0.098 0.099   0.1 0.101 0.102 0.103 0.104 0.105 0.106 0.107 0.108 
##    18    12     8    13     5    10     7    16     6     8     9     1 
## 0.109  0.11 0.111 0.112 0.113 0.114 0.115 0.116 0.117 0.118 0.119  0.12 
##     3     8     7     6     1    11     5     2     4     8     3     3 
## 0.121 0.122 0.123 0.124 0.125 0.126 0.127 0.128 0.132 0.136 0.137 0.143 
##     2     7     6     3     1     1     1     1     4     1     1     1 
## 0.145 0.146 0.147 0.148 0.152 0.153 0.157 0.159 0.161 0.165 0.166 0.168 
##     1     1     1     1     2     1     3     1     1     1     3     1 
## 0.169  0.17 0.171 0.172 0.174 0.176 0.178 0.186  0.19 0.194   0.2 0.205 
##     1     1     2     1     1     1     2     1     1     1     1     2 
## 0.213 0.214 0.216 0.222 0.226  0.23 0.235 0.236 0.241 0.243  0.25 0.263 
##     1     3     1     1     2     1     1     1     1     1     1     1 
## 0.267  0.27 0.332 0.337 0.341 0.343 0.358  0.36 0.368 0.369 0.387 0.401 
##     1     1     1     1     1     1     1     1     1     1     1     1 
## 0.403 0.413 0.414 0.415 0.422 0.464 0.467  0.61 0.611 
##     1     1     2     3     1     1     1     1     1

Free sulfur dioxide peaks at 6 and is skewed to the right.

## 
##    1    2    3    4    5  5.5    6    7    8    9   10   11   12   13   14 
##    3    1   49   41  104    1  138   71   56   62   79   59   75   57   50 
##   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29 
##   78   61   60   46   39   30   41   22   32   34   24   32   29   23   23 
##   30   31   32   33   34   35   36   37 37.5   38   39   40 40.5   41   42 
##   16   20   22   11   18   15   11    3    2    9    5    6    1    7    3 
##   43   45   46   47   48   50   51   52   53   54   55   57   66   68   72 
##    3    3    1    1    4    2    4    3    1    1    2    1    1    2    1
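The peak of a distribution can be read off a frequency table like the one above rather than estimated from a histogram. The sketch below rebuilds a small slice of the free sulfur dioxide data from the printed counts (values 3 to 8 only) and looks up the most frequent value; the variable names are illustrative.

```r
# Values 3 to 8 and their counts, copied from the frequency table above
values <- c(3, 4, 5, 6, 7, 8)
counts <- c(49, 41, 104, 138, 71, 56)

# Expand the counts back into individual observations
free_so2 <- rep(values, times = counts)

# Frequency table, then the value with the highest count (the mode)
freq <- table(free_so2)
most_common <- as.numeric(names(freq)[which.max(freq)])
most_common
```

Note that `max()` would return the largest observed value, not the most frequent one, so the two should not be confused when describing a histogram.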

Total sulfur dioxide is also skewed to the right. Density is probably the only variable that is close to normally distributed. Most pH values fall between about 2.9 and 3.7 (the full range is 2.74 to 4.01), which shows that all the red wines are acidic. Sulphates is skewed to the right, with a peak just under 0.6. Alcohol is heavily concentrated at around 9.5.

Univariate Analysis

What is the structure of your dataset?

There are 1599 wines in the dataset with 12 features (13 variables, but the first variable, X, is just an index). The features are fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol and quality.

All variables are numeric (double or integer). ‘quality’ is stored as an integer and was converted to a factor, because it is a classification rather than a measurement.

Observations:
1) Quality (10 being best, 0 being worst): This dataset only contains ratings from 3 to 8. The most common rating is 5, with 681 out of 1599 wines, followed by 6 with 638 and 7 with 199. The least common is 3, with 10. The first quartile is 5 and both the median and the third quartile are 6, so at least 75% of wines score 5 or higher and at least half score 6 or higher.
2) Fixed acidity: The most frequent value is 7.2.
3) Volatile acidity: The interquartile range is 0.39 to 0.64.
4) pH: The ideal pH range is around 3.2 to 3.7 (reference). The interquartile range, 3.21 to 3.40, falls within it. The top 25% of values lie between 3.40 and 4.01.

What is/are the main feature(s) of interest in your dataset?

I would like to find out the relationship between the level of acidity and quality. I would also like to find out how the combination of acidity and other factors affects the overall quality of the red wine.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

Residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, sulphates and alcohol. But mainly pH.

Did you create any new variables from existing variables in the dataset?

A new variable, ‘quality_rating’, was created to group the quality scores into three categories:
1) 0 to 4: Bad
2) 5 to 6: Average
3) 7 to 10: Good

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

In many of the plots, outliers were removed so that the graphs focus on the range where most values are concentrated. Once the outliers are removed, the distributions look less skewed.
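The report does not say how the outliers were chosen, so the following is only one possible approach: keep values at or below a chosen upper quantile before plotting. The 95% cutoff and the variable names are assumptions; the ten values are the first fixed acidity readings printed by `str()`.

```r
# First ten fixed acidity values from the str() output (11.2 stands out)
fixed_acidity <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5)

# Assumed rule: treat anything above the 95th percentile as an outlier
upper <- quantile(fixed_acidity, 0.95)
trimmed <- fixed_acidity[fixed_acidity <= upper]

# Compare the summaries before and after trimming
summary(fixed_acidity)
summary(trimmed)
```

Trimming changes only what is displayed or summarised here; the original vector is left untouched, so later correlation tests can still use the full data.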

Bivariate Plots Section

Quick correlation snapshot: an absolute correlation of 0 to 0.39 is weak, 0.4 to 0.59 is moderate and 0.6 to 1 is strong.
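The scale above can be written as a small helper so that each coefficient is labelled consistently. The function name `cor_strength` is hypothetical, and the three inputs are coefficients reported by the correlation tests in this section.

```r
# Label a correlation coefficient by absolute size:
# [0, 0.4) weak, [0.4, 0.6) moderate, [0.6, 1] strong
cor_strength <- function(r) {
  cut(
    abs(r),
    breaks = c(0, 0.4, 0.6, 1),
    labels = c("weak", "moderate", "strong"),
    right = FALSE,          # intervals are closed on the left
    include.lowest = TRUE   # so that exactly 1 still counts as strong
  )
}

# quality vs alcohol, quality vs volatile acidity, pH vs fixed acidity
as.character(cor_strength(c(0.4761663, -0.3905578, -0.6829782)))
```

Applied to the unrounded values, the volatile acidity coefficient (-0.39) falls just on the weak side of the boundary; it only reads as moderate once rounded to -0.4.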

Some pairs show stronger correlations than others: 1) Quality and alcohol: 0.5, moderate 2) Quality and volatile acidity: -0.4, moderate. Other combinations also show strong correlations (positive or negative), but if they do not involve quality I am not focusing on them in this EDA.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$quality and redwine_pf$alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663

As the correlation coefficient above (0.476) suggests, the alcohol level tends to rise with quality. There is a slight drop from quality 4 to 5. However, the dataset has few samples with quality 3 and 4 (7 and 8 are also sparse, though less so), so this part of the picture may be distorted.
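For reference, a Pearson test like the one printed above can be run with `cor.test()`. The data frame name `redwine_pf` comes from the output; the ten rows are only the first values shown by `str()`, so the numbers this sketch prints will differ from the full-data result of 0.476.

```r
# Toy subset: first ten alcohol and quality values from the str() output
redwine_pf <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Pearson's product-moment correlation with a 95% confidence interval
result <- cor.test(redwine_pf$quality, redwine_pf$alcohol, method = "pearson")

result$estimate   # correlation coefficient
result$conf.int   # confidence interval for the coefficient
result$p.value    # p-value for the null hypothesis of zero correlation
```

The degrees of freedom equal the number of rows minus two, which is why the full dataset of 1599 wines reports df = 1597.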

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.60   10.00   10.22   11.00   13.10

The above is the summary of alcohol for ‘Bad’ quality wines. Both the mean and the median are lower than for ‘Good’ quality wines, and the maximum is lower than that of ‘Average’ wines.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.00   10.25   10.90   14.90

For ‘Average’ quality wines, the median is the same as for ‘Bad’ quality wines, but the mean is slightly higher (possibly due to the higher maximum).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.20   10.80   11.60   11.52   12.20   14.00

For ‘Good’ quality wines, both the median and the mean are higher than for ‘Bad’ and ‘Average’ wines. Interestingly, the maximum is lower than that of ‘Average’ wines.

Graphical representation of the above.
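Per-group summaries like the three above can be produced in one call with `tapply()`. This is a sketch on the first ten rows printed by `str()`, which happen to contain no ‘Bad’ wines, so only two groups appear; the binning with `cut()` is an assumption about how ‘quality_rating’ was built.

```r
# Toy subset: first ten alcohol and quality values from the str() output
redwine_pf <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Assumed grouping: up to 4 = Bad, 5 to 6 = Average, 7 and above = Good.
# droplevels() removes groups with no rows in this small sample.
redwine_pf$quality_rating <- droplevels(cut(
  redwine_pf$quality,
  breaks = c(-Inf, 4, 6, Inf),
  labels = c("Bad", "Average", "Good")
))

# Full five-number summary plus mean for each group
tapply(redwine_pf$alcohol, redwine_pf$quality_rating, summary)

# Just the medians, for a quick side-by-side comparison
medians <- tapply(redwine_pf$alcohol, redwine_pf$quality_rating, median)
medians
```

The same pattern works for volatile acidity or sulphates by swapping the first argument.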

Another factor that is clearly related to quality is volatile acidity.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$quality and redwine_pf$volatile.acidity
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578

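This shows a gentle downward trend, consistent with the negative correlation between volatile acidity and quality. The three summaries below are for ‘Bad’, ‘Average’ and ‘Good’ wines, in the same order as for alcohol.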

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2300  0.5650  0.6800  0.7242  0.8825  1.5800
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.4100  0.5400  0.5386  0.6400  1.3300
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3000  0.3700  0.4055  0.4900  0.9150

Although not as strong, sulphates also shows a somewhat stronger correlation with quality than the remaining chemicals, at 0.251. As expected, the sulphates level goes up as quality increases. Once again, I would expect a steeper curve if there were as much data for quality 3 & 4 (‘Bad’) and 7 & 8 (‘Good’).

## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$quality and redwine_pf$sulphates
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049011 0.2967610
## sample estimates:
##       cor 
## 0.2513971

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.4950  0.5600  0.5922  0.6000  2.0000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3700  0.5400  0.6100  0.6473  0.7000  1.9800
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3900  0.6500  0.7400  0.7435  0.8200  1.3600

The outputs below show the correlation of the three acid measures with pH. As they all represent ‘acidity’, I will later combine them and then compare. One measure with an interesting result is volatile acidity: unlike the other two, it shows a positive correlation with pH, even though more acid would be expected to lower pH. This may need further investigation.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$pH and redwine_pf$volatile.acidity
## t = 9.659, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1880823 0.2807254
## sample estimates:
##       cor 
## 0.2349373
## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$pH and redwine_pf$fixed.acidity
## t = -37.366, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7082857 -0.6559174
## sample estimates:
##        cor 
## -0.6829782
## 
##  Pearson's product-moment correlation
## 
## data:  redwine_pf$pH and redwine_pf$citric.acid
## t = -25.767, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5756337 -0.5063336
## sample estimates:
##        cor 
## -0.5419041

Hence, I checked for Simpson’s paradox between volatile acidity and pH. Simpson’s paradox occurs when a relationship that appears within each of several groups disappears or reverses when the groups are combined.
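The idea can be illustrated by comparing the overall correlation with the correlation inside each group. The data below is invented purely for illustration and has nothing to do with the wine dataset; the names `toy`, `overall` and `within` are hypothetical.

```r
# Synthetic example (not wine data): within each group y falls as x rises,
# but group B sits higher on both axes than group A
toy <- data.frame(
  x     = c(1, 2, 3, 6, 7, 8),
  y     = c(3, 2, 1, 8, 7, 6),
  group = rep(c("A", "B"), each = 3)
)

# Correlation ignoring the groups
overall <- cor(toy$x, toy$y)

# Correlation computed separately inside each group
within <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

overall  # positive
within   # negative in both groups
```

For the wine data, the analogous check would split the wines by a grouping variable and compare the per-group correlations of volatile acidity and pH with the overall value of 0.235.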

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

According to the correlation table, the variables that stand out in relation to quality are alcohol and volatile acidity. Alcohol shows a positive correlation of about 0.48, which means that more alcohol tends to go with better quality. However, this is still a moderate correlation rather than a strong or very strong one. In the boxplot by quality group (0-4, 5-6 and 7-10), the highest group clearly shows a much higher alcohol level than the other two groups. Volatile acidity shows a negative correlation of about 0.39, which means that less volatile acidity tends to go with better quality. The boxplot clearly shows a downward trend in volatile acidity as quality goes up. If anything else, I would look at the sulphates level, which shows a positive correlation of about 0.25 with quality.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

Alcohol and density show a negative correlation of circa 0.5. pH and fixed acidity show a negative correlation of circa 0.7. Density and fixed acidity show a positive correlation of circa 0.7. Citric acid and volatile acidity show a negative correlation of circa 0.6. Citric acid and fixed acidity show a positive correlation of circa 0.7. Volatile acidity and fixed acidity show a negative correlation of circa 0.3.

What was the strongest relationship you found?

Alcohol and volatile acidity show the strongest relationships with quality.

Multivariate Plots Section

##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"             
## [13] "total.acidity"
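The extra column ‘total.acidity’ in the listing above is described later as the sum of the three acidity measures. A minimal sketch of that step, using the first four rows printed by `str()` (the data frame name `redwine_pf` is taken from the correlation output; the exact code used in the report is not shown):

```r
# Toy subset: first four rows of the three acidity columns from str()
redwine_pf <- data.frame(
  fixed.acidity    = c(7.4, 7.8, 7.8, 11.2),
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28),
  citric.acid      = c(0, 0, 0.04, 0.56)
)

# Total acidity as the plain sum of fixed, volatile and citric acidity
redwine_pf$total.acidity <- with(
  redwine_pf,
  fixed.acidity + volatile.acidity + citric.acid
)

redwine_pf
```

Because fixed acidity is an order of magnitude larger than the other two measures, it dominates this sum; that is worth keeping in mind when interpreting trends in ‘total.acidity’.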

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

The most closely related variables are acidity, alcohol and quality. I added up all the acidities (volatile, fixed and citric) and compared the total against increasing alcohol level, which is moderately correlated with overall quality. Apart from some outliers in the average quality group, total acidity shows a downward trend as alcohol and quality increase.

Were there any interesting or surprising interactions between features?

Total acidity seems to start at a higher level within each quality rating category.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

N/A


Final Plots and Summary

This EDA (Exploratory Data Analysis) aims to understand which chemical properties affect the overall quality of red wines. The original dataset contains 13 variables and 1599 observations. The variables contain different measurements, including alcohol, pH and sulphates, as well as the overall quality of each wine on a scale of 0 to 10 (0 being worst and 10 being best).

Plot One

Description One

Because the dataset is heavily concentrated in the average quality group (scores 5 and 6 make up 82.49% of the dataset, whereas bad quality accounts for 3.94% and good quality for 13.57%), the EDA cannot give a very accurate or general picture across quality categories. This should be kept in mind when judging how meaningful the analysis is for wines outside the average group.

Plot Two

Description Two

Alcohol is one of the variables related to the overall quality of the wines. There is a general positive correlation between alcohol content and overall quality, and in this dataset the alcohol percentage mostly varies between 9 and 13.

Plot Three

Description Three

Another factor that affects the quality of red wine, in addition to alcohol, is the acidity level. The dataset contains citric, fixed and volatile acidity levels. I combined these three into a new variable called ‘Total acidity’. Allowing for some outliers among wines with an ‘average’ quality rating, the general trend is that the lower the acidity, the better the wine quality.

Reflection

As my very first project in R, it has been a great pleasure to learn how easy yet powerful a tool it is for analysing data. The graphics tools were strong and fast, and it was easy to add elements that helped me understand the underlying message behind the numbers in the dataset. Despite my limited ability to add more factors to the graphs and plots, I managed to identify the variables that affect the overall quality of wines, which are alcohol and acidity. What could have been better is a fuller dataset with an equal number of observations for each quality rating. That would have helped me explore the data more accurately and understand how each variable raises or lowers the quality rating. Lastly, a better understanding and enjoyment of wine would have helped me learn the links between the variables. On that note, I am happy to start learning more about red wine; understanding the industry will help me explore this dataset with more real-world insight.